margin distribution
1ea97de85eb634d580161c603422437f-Supplemental.pdf
Supplementary material: Hold me tight!

Contents:
- A Theoretical margin distribution of a linear classifier
- B Examples of frequency "flipped" images
- C Invariance and elasticity on MNIST data
- D Connections to catastrophic forgetting
- E Examples of filtered images
- F Subspace sampling of the DCT
- G Training parameters
- H Cross-dataset performance
- I Margin distribution for standard networks
- J Adversarial training parameters
- K Description of L2-PGD attack on frequency "flipped" data
- L Spectral decomposition on frequency "flipped" data
- M Margin distribution for adversarially trained networks
- N Margin distribution on random subspaces

Figure S4: Filtered image examples. Table S2 shows the performance and training parameters of the different networks used in the paper.
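The first supplementary section concerns the margin distribution of a linear classifier. As an illustration of that quantity only (not code from the paper; the function names and toy data below are assumptions), a minimal Python sketch of the signed margins y * (w·x + b) / ||w|| over a dataset:

```python
# Minimal sketch: empirical margin distribution of a linear classifier
# f(x) = w.x + b. Names and toy data are illustrative assumptions.
import numpy as np

def margin_distribution(w, b, X, y):
    """Signed distances of samples to the decision hyperplane.

    w: (d,) weights, b: scalar bias, X: (n, d) inputs, y: (n,) labels in {-1, +1}.
    Positive margin means the sample is correctly classified.
    """
    return y * (X @ w + b) / np.linalg.norm(w)

# Toy usage: Gaussian blobs separated along the first coordinate.
rng = np.random.default_rng(0)
y = rng.choice([-1, 1], size=200)
X = rng.normal(size=(200, 5)) + 2.0 * y[:, None] * np.eye(5)[0]
w, b = np.eye(5)[0], 0.0
margins = margin_distribution(w, b, X, y)
print(margins.mean(), margins.std())
```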
Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition
Xu, Yang, Li, Junpeng, Hua, Changchun, Yang, Yana
Abstract--The Large Margin Distribution Machine (LMDM) is a recent advancement in classifier design that optimizes not just the minimum margin (as in SVM) but the entire margin distribution, thereby improving generalization. However, existing LMDM formulations are limited to vectorized inputs and struggle with high-dimensional tensor data due to the need for flattening, which destroys the data's inherent multi-mode structure and increases computational burden. In this paper, we propose a Structure-Preserving Margin Distribution Learning for High-Order Tensor Data with Low-Rank Decomposition (SPMD-LRT) that operates directly on tensor representations without vectorization. The SPMD-LRT preserves multi-dimensional spatial structure by incorporating first-order and second-order tensor statistics (margin mean and variance) into the objective, and it leverages low-rank tensor decomposition techniques, including rank-1 (CP), higher-rank CP, and Tucker decomposition, to parameterize the weight tensor. An alternating optimization (double-gradient descent) algorithm is developed to efficiently solve the SPMD-LRT, iteratively updating the factor matrices and core tensor. This approach enables SPMD-LRT to maintain the structural information of high-order data while optimizing the margin distribution for improved classification. Extensive experiments on diverse datasets (including MNIST, images, and fMRI neuroimaging) demonstrate that SPMD-LRT achieves superior classification accuracy compared to conventional SVM, vector-based LMDM, and prior tensor-based SVM extensions (Support Tensor Machines and Support Tucker Machines). These results confirm the effectiveness and robustness of SPMD-LRT in handling high-dimensional tensor data for classification.

Advances in data acquisition have led to an abundance of high-order tensor data (multi-dimensional arrays) across various domains, such as video sequences, medical imaging, and spatiotemporal sensor readings. Effectively learning from such tensor-structured data has become a pressing research focus [1] [2]. The multi-dimensional structure of tensors offers rich information (e.g.
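To make the rank-1 (CP) parameterization concrete, the following is a minimal sketch (function names, shapes, and toy data are assumptions, not the authors' SPMD-LRT implementation) of scoring a third-order tensor with a rank-1 weight tensor and forming the margin mean and variance that an LMDM-style objective penalizes:

```python
# Illustrative sketch only: score a 3-way tensor with a rank-1 (CP) weight
# tensor W = u ∘ v ∘ s and compute LMDM-style margin statistics.
import numpy as np

def rank1_score(X, u, v, s, b=0.0):
    """<W, X> + b with W = u ∘ v ∘ s, computed without materializing W."""
    # Contract each mode of X (shape d1 x d2 x d3) with its factor vector.
    return np.einsum('ijk,i,j,k->', X, u, v, s) + b

def margin_stats(tensors, labels, u, v, s, b=0.0):
    """Margin mean and variance over a dataset, as in margin distribution learning."""
    margins = np.array([y * rank1_score(X, u, v, s, b)
                        for X, y in zip(tensors, labels)])
    return margins.mean(), margins.var()

# Toy usage on random 8x8x3 tensors with labels in {-1, +1}.
rng = np.random.default_rng(1)
tensors = [rng.normal(size=(8, 8, 3)) for _ in range(50)]
labels = rng.choice([-1, 1], size=50)
u, v, s = rng.normal(size=8), rng.normal(size=8), rng.normal(size=3)
mean_m, var_m = margin_stats(tensors, labels, u, v, s)
print(mean_m, var_m)
```

In the full method described in the abstract, these margin statistics would enter the objective alongside a loss and regularizer, and the factor vectors (or, for Tucker, the core tensor and factor matrices) would be updated alternately rather than jointly.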
Implicit bias produces neural scaling laws in learning curves, from perceptrons to deep networks
D'Amico, Francesco, Bocchi, Dario, Negri, Matteo
Scaling laws in deep learning - empirical power-law relationships linking model performance to resource growth - have emerged as simple yet striking regularities across architectures, datasets, and tasks. These laws are particularly impactful in guiding the design of state-of-the-art models, since they quantify the benefits of increasing data or model size, and hint at the foundations of interpretability in machine learning. However, most studies focus on asymptotic behavior at the end of training or on the optimal training time given the model size. In this work, we uncover a richer picture by analyzing the entire training dynamics through the lens of spectral complexity norms. We identify two novel dynamical scaling laws that govern how performance evolves during training. These laws together recover the well-known test error scaling at convergence, offering a mechanistic explanation of generalization emergence. Our findings are consistent across CNNs, ResNets, and Vision Transformers trained on MNIST, CIFAR-10 and CIFAR-100. Furthermore, we provide analytical support using a solvable model: a single-layer perceptron trained with binary cross-entropy. In this setting, we show that the growth of spectral complexity driven by the implicit bias mirrors the generalization behavior observed at fixed norm, allowing us to connect the performance dynamics to classical learning rules in the perceptron.
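The solvable model mentioned above can be reproduced in a few lines. The sketch below (toy data and names are assumptions, not the paper's setup) trains a single-layer perceptron with binary cross-entropy on linearly separable data and logs the growth of the weight norm, which for a one-layer model is its spectral complexity:

```python
# Minimal sketch: single-layer perceptron trained with binary cross-entropy,
# logging weight-norm growth during training. Toy setup, not the paper's code.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d = 500, 50
w_teacher = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = (X @ w_teacher > 0).astype(float)          # separable labels in {0, 1}

w = np.zeros(d)
lr = 0.5
for step in range(1, 5001):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) / n                   # gradient of mean BCE loss
    w -= lr * grad
    if step % 1000 == 0:
        # For a single linear layer the spectral norm reduces to ||w||_2.
        print(step, np.linalg.norm(w))
```

On separable data the cross-entropy loss can always be decreased by scaling w up, so gradient descent keeps growing the norm while the direction stabilizes; tracking that growth is what links the training dynamics to the fixed-norm perceptron analysis described in the abstract.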